Learning Dual Semantic Relations With Graph Attention for Image-Text Matching

Authors

Abstract

Image-text matching is a major task in cross-modal information processing, and its main challenge is to learn unified visual and textual representations. Previous methods that perform well on this task focus not only on the alignment between region features in images and the corresponding words in sentences, but also on the alignment between relations among regions and relational words. However, the lack of joint learning of regional and global features causes regional features to lose contact with the global context, leading to mismatches with those non-object words that carry global meaning in some sentences. To alleviate this issue, it is necessary to enhance both the relations between regions and the relations between regional and global concepts, so as to obtain a more accurate visual representation that correlates better with the corresponding text. Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, consisting mainly of two modules: a separate semantic relations module and a joint semantic relations module. DSRAN performs graph attention in both modules, for region-level relation enhancement and regional-global relation enhancement respectively. With these two modules, semantic relations at different hierarchies are learned simultaneously, promoting the image-text matching process by providing richer information for the final visual representation. Quantitative experiments on MS-COCO and Flickr30K show that the method outperforms previous approaches by a large margin, owing to the effectiveness of the dual semantic relations learning scheme. Code is available at https://github.com/kywen1119/DSRAN.
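As a rough illustration of the graph-attention operation the abstract refers to, the sketch below applies single-head, GAT-style attention over a fully connected graph of image-region features. The dimensions, the fully connected adjacency, and all names are illustrative assumptions, not the authors' implementation (that is in the linked repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    # Single-head graph attention over a fully connected graph of image
    # regions (a hedged sketch; dimensions and connectivity are assumptions).
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scorer

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_regions, in_dim) region features, e.g. from a detector
        z = self.W(h)                                    # (B, N, D)
        n = z.size(1)
        zi = z.unsqueeze(2).expand(-1, -1, n, -1)        # (B, N, N, D)
        zj = z.unsqueeze(1).expand(-1, n, -1, -1)        # (B, N, N, D)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], -1))).squeeze(-1)
        alpha = torch.softmax(e, dim=-1)                 # weights over neighbors
        return F.elu(alpha @ z)                          # relation-enhanced regions
```

For example, `GraphAttentionLayer(2048, 1024)(torch.randn(8, 36, 2048))` would enhance 36 detected regions per image with pairwise relation information; DSRAN applies this kind of attention at both the region level and the regional-global level.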


Similar Articles

Stacked Cross Attention for Image-Text Matching

In this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuff (e.g., snow, sky, lawn) and the corresponding words in sentences allows us to capture the fine-grained interplay between vision and language, and makes image-text matching more interpretable. Prior works either simply aggregate the similarity of all possible pairs ...
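The cross-attention idea in this summary can be sketched roughly as follows: each image region attends over the word embeddings, and the image-sentence score pools the region-level similarities. This is a hedged sketch under assumed shapes and temperature, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def stacked_cross_attention(regions, words, smooth=9.0):
    # regions: (n_regions, d); words: (n_words, d); `smooth` is a softmax
    # temperature (the value here is an assumption, not the paper's setting).
    r = F.normalize(regions, dim=-1)
    w = F.normalize(words, dim=-1)
    sim = r @ w.t()                        # (n_regions, n_words) cosine scores
    attn = torch.softmax(smooth * sim, 1)  # each region attends over all words
    attended = attn @ words                # (n_regions, d) attended text vectors
    rel = F.cosine_similarity(regions, attended, dim=-1)
    return rel.mean()                      # pooled image-sentence score
```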


Image Caption Generation with Text-Conditional Semantic Attention

Attention mechanisms have attracted considerable interest in image captioning due to their powerful performance. However, existing methods use only visual content for attention, and whether textual context can improve attention in image captioning remains unsolved. To explore this problem, we propose a novel attention mechanism, called text-conditional attention, which allows the caption generator t...


Analysing Image-Text Relations for Semantic Media Adaptation and Personalisation

Progress in semantic media adaptation and personalisation requires that we know more about how different media types, such as texts and images, work together in multimedia communication. To this end, we present our ongoing investigation into image-text relations. Our idea is that the ways in which the meanings of images and texts relate in multimodal documents, such as web pages, can be classif...


Joint Semantic Relevance Learning with Text Data and Graph Knowledge

Inferring semantic relevance among entities (e.g., entries of Wikipedia) is important and challenging. According to the information resources, the inference can be categorized into learning with either raw text data, or labeled text data (e.g., wiki pages), or graph knowledge (e.g., WordNet). Although graph knowledge tends to be more reliable, text data is much less costly and offers a better cov...


Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity bet...
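The two-branch idea in this summary can be illustrated with a minimal joint-embedding sketch: one projection per modality into a shared space, trained with a bidirectional max-margin ranking loss over in-batch negatives. All dimensions and the margin are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchEmbedding(nn.Module):
    # Each modality gets its own projection into a shared space where cosine
    # similarity ranks matches (dimensions here are illustrative assumptions).
    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=512):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Linear(img_dim, joint_dim), nn.ReLU(),
            nn.Linear(joint_dim, joint_dim))
        self.txt_branch = nn.Sequential(
            nn.Linear(txt_dim, joint_dim), nn.ReLU(),
            nn.Linear(joint_dim, joint_dim))

    def forward(self, img_feat, txt_feat):
        x = F.normalize(self.img_branch(img_feat), dim=-1)
        y = F.normalize(self.txt_branch(txt_feat), dim=-1)
        return x @ y.t()  # (n_images, n_texts) similarity matrix

def bidirectional_ranking_loss(sim, margin=0.2):
    # Max-margin ranking over in-batch negatives, in both directions;
    # sim is square, with matched image-text pairs on the diagonal.
    pos = sim.diag().unsqueeze(1)
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_s = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_s.mean() + cost_i.mean()
```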



Journal

Journal: IEEE Transactions on Circuits and Systems for Video Technology

Year: 2021

ISSN: 1051-8215, 1558-2205

DOI: https://doi.org/10.1109/tcsvt.2020.3030656